Datamining and Disclosure Limitation for Categorical Statistical Databases
Abstract
There are many distinctions between statistical research databases and those arising in commercial or administrative settings, and thus different issues regarding confidentiality and privacy protection on the one hand and data access and the use of databases on the other. Data integration across multiple databases raises issues in both domains, especially with regard to protection against intruders. This paper highlights some methods developed to limit possible disclosure of confidential information from statistical databases while still publicly releasing enough information for users, whether dataminers or more traditional statistical analysts, to reach proper statistical conclusions from their analyses. The disclosure limitation tools discussed include data perturbation and simulation, partial releases, and sampling, with a special focus on partial release of data from multidimensional cross-classifications or contingency tables.
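To make the idea of a partial release from a contingency table concrete, the following is a minimal sketch, not taken from the paper itself. It assumes the agency releases only the row and column totals of a two-way table, and computes the classical Fréchet bounds that those totals place on each hidden cell count. The function name `frechet_bounds` and the example counts are hypothetical.

```python
import numpy as np

def frechet_bounds(row_totals, col_totals):
    """Bounds on each cell of a two-way table implied by its margins.

    For cell (i, j) with row total r_i, column total c_j and grand total n:
        max(0, r_i + c_j - n) <= n_ij <= min(r_i, c_j)
    """
    row = np.asarray(row_totals)[:, None]   # column vector of row totals
    col = np.asarray(col_totals)[None, :]   # row vector of column totals
    n = row.sum()                           # grand total
    lower = np.maximum(0, row + col - n)
    upper = np.minimum(row, col)
    return lower, upper

# Hypothetical confidential counts; only the margins are released.
table = np.array([[3, 1],
                  [0, 6]])
lower, upper = frechet_bounds(table.sum(axis=1), table.sum(axis=0))
print(lower)
print(upper)
```

Narrow bounds, especially on small cells, suggest that the released margins reveal more about individual cells than intended; wide bounds suggest the opposite. Tables with more than two dimensions and overlapping margins need more elaborate bound calculations than this sketch covers.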
Related papers
Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion
Before releasing databases which contain sensitive information about individuals, data publishers must apply Statistical Disclosure Limitation (SDL) methods to them, in order to avoid disclosure of sensitive information on any identifiable data subject. SDL methods often consist of masking or synthesizing the original data records in such a way as to minimize the risk of disclosure of the sensi...
A Risk-Utility Framework for Categorical Data Swapping
Data swapping is a statistical disclosure limitation method used to protect the confidentiality of data by interchanging variable values between records. We propose a risk-utility framework for selecting an optimal swapped data release when considering several swap variables and multiple swap rates. Risk and utility values associated with each such swapped data file are traded off along a front...
Assessing the Risk of Disclosure of Confidential Categorical Data
Disclosure limitation involves the application of statistical tools to limit the identification of information on individuals (and enterprises) included as part of statistical data bases such as censuses and sample surveys. We outline the major issues involved in assessing disclosure risk and assuring the protection of confidentiality for data bases, especially those in the form of multi-way co...
Statistical perspectives on confidentiality and data access in public health.
Confidentiality and disclosure limitation are topics that are inherently statistical but, until recently, they have received limited attention from statistical methodologists. That situation has changed considerably in the present decade. In this paper, we provide an introduction and overview of some statistical disclosure limitation issues that are of special relevance to public health studies...
Advances in Inference Control in Statistical Databases: An Overview
Inference control in statistical databases is a discipline with several other names, such as statistical disclosure control, statistical disclosure limitation, or statistical database protection. Regardless of the name used, current work in this very active field is rooted in the work that was started on statistical database protection in the 70s and 80s. Massive production of computerized statis...